Text Classification Using the N-Gram Graph Representation Model Over High Frequency Data Streams
Similar resources
Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study
This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the “Manhattan distance”, and Dice’s measure of similarity. The Dice measure was used for comparison purposes. Results show that N-gram text classification using the Dice measure outperforms classification using the Manhattan measure.
Sentiment Classification over Opinionated Data Streams Through Informed Model Adaptation
Opinionated data streams are very popular data paradigms nowadays as more and more users share their opinions online about almost everything from products to persons, brands and ideas. One of the key challenges for opinionated stream mining is dealing with concept drifts in the underlying stream population by building learners that adapt to such concept changes. Ageing is a typical way of adapt...
Approximate Frequency Counts over Data Streams
Research in data stream algorithms has blossomed since the late 90s. The talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized and how it influenced data stream research. The talk will also touch upon a recent development: analysis of personal data streams for improving our quality of life. 1. BIOGRAPHICAL SKETCHES Gurmeet Manku (1973-) is a software engi...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Two-step Feature Selection Algorithm Based on N-gram Representation in Chinese Text Classification
Usually, there are two steps in the construction of an automated text classification system. The first step is that the texts are coded into a representation more suitable for the learning algorithm. There are various ways of representing a text such as by using word fragments, words, phrases, meanings, and concepts [82]. Different text representations have different dependence on the language ...
Journal
Journal title: Frontiers in Applied Mathematics and Statistics
Year: 2018
ISSN: 2297-4687
DOI: 10.3389/fams.2018.00041